2 ◾ Bioinformatics
base pair of guanine (C/G)) as shown in Figure 1.1. Adenine and thymine form two
hydrogen bonds (a weaker pairing), while cytosine and guanine form three hydrogen bonds
(a stronger pairing). These base pairings are specific, so the sequence of one strand can be
predicted from the other. The length of a DNA sequence is given in base pairs (bp), kilobase
pairs (kbp), or megabase pairs (Mbp). RNA exists as a single strand; however, it sometimes
folds back on itself to form double-stranded secondary structures that perform specific functions.
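The pairing rules above can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical, chosen only for illustration. Reversing the complement reflects the standard convention that the two strands run in opposite directions, which this passage does not state explicitly.

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hydrogen bonds per base pair: two for A/T, three for C/G.
HYDROGEN_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


def hydrogen_bond_count(strand: str) -> int:
    """Count the hydrogen bonds holding a strand to its complement."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


example = "ATGCGC"  # hypothetical 6 bp sequence
print(complement(example))           # TACGCG
print(reverse_complement(example))   # GCGCAT
print(hydrogen_bond_count(example))  # 16
```

Because C/G pairs contribute three bonds each, a sequence rich in C and G is held together by more hydrogen bonds than an A/T-rich sequence of the same length.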
The genome of an organism is the book of life for that organism. It determines the living
aspects and biological activities of cells. A genome contains coding regions known as genes
that carry information for protein synthesis. Genes are transcribed into messenger RNA
(mRNA), which is translated into proteins, and these proteins control most of the biological
processes in living organisms.
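The flow from gene to mRNA to protein can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation: the codon table below is only a small subset of the standard genetic code, the gene sequence is hypothetical, and the sketch assumes the input is the coding strand, so transcription reduces to replacing T with U.

```python
# Small subset of the standard genetic code (codon -> one-letter amino acid).
# A complete table has 64 entries; only those needed for the example appear here.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCU": "A",  # alanine
    "AAA": "K",  # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a coding strand: same sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGTTTGGCAAATAA"  # hypothetical coding sequence
mrna = transcribe(gene)
print(mrna)             # AUGUUUGGCAAAUAA
print(translate(mrna))  # MFGK
```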
A gene consists of coding regions, non-coding regions, and a regulatory region. The coding
regions of eukaryotic genes are not continuous: non-coding sequences (called introns)
are found between the coding sequences (called exons). The introns are removed
from the transcripts before translation, leaving only the exons, which together
form the coding region called the open reading frame (ORF). Each eukaryotic gene has its
own regulatory region that controls its expression. In prokaryotic cells, a group of genes,
called an operon, is regulated by a single regulatory region. Viruses, which fall on
the boundary between living organisms and chemical particles, function and replicate only
inside host cells. They use the host cell's machinery, such as ribosomes, to produce their
structural and non-structural proteins and to replicate, creating new virions.
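The removal of introns described above for eukaryotic genes can be sketched as follows. The transcript and the exon coordinates are hypothetical, and the exon positions are simply given as input; a real cell recognizes intron boundaries through sequence signals that this sketch does not model.

```python
from typing import List, Tuple


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Hypothetical transcript: exons in upper case, introns in lower case
# purely so the two are easy to tell apart when reading the string.
pre_mrna = "AUGGCU" + "guaaguuuuag" + "UUUAAA" + "guccccag" + "GGCUAA"
exons = [(0, 6), (17, 23), (31, 37)]  # assumed exon coordinates

mature = splice(pre_mrna, exons)
print(mature)  # AUGGCUUUUAAAGGCUAA
```

The spliced sequence contains only the exons, in their original order, and is what would then be translated into protein.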
[Figure: chemical structures of the four bases, labeled Guanine, Cytosine, Thymine, and Adenine]
FIGURE 1.1 Base pairing and hydrogen bonds between pairs of the DNA nucleotides.